Goto

Collaborating Authors

 tactile feedback


Learning Dexterous Manipulation Skills from Imperfect Simulations

Hsieh, Elvis, Hsieh, Wen-Han, Wang, Yen-Jen, Lin, Toru, Malik, Jitendra, Sreenath, Koushil, Qi, Haozhi

arXiv.org Artificial Intelligence

Figure 1: We propose DexScrew, a sim-to-real framework for learning dexterous manipulation skills when the environment cannot be accurately simulated. In simulation, we use simplified objects to learn transferable rotational skills, which are then used to collect data and train tactile policies in the real world. We demonstrate the framework on contact-rich screwdriving (top row) and nut-bolt fastening (middle row). We also show generalization across different objects (bottom row). More videos and code are available on https://dexscrew.github.io. Abstract-- Reinforcement learning and sim-to-real transfer have made significant progress in dexterous manipulation. However, progress remains limited by the difficulty of simulating complex contact dynamics and multisensory signals, especially tactile feedback. In this work, we propose DexScrew, a sim-to-real framework that addresses these limitations and demonstrates its effectiveness on nut-bolt fastening and screwdriving with multi-fingered hands. The framework has three stages. First, we train reinforcement learning policies in simulation using simplified object models that lead to the emergence of correct finger gaits. We then use the learned policy as a skill primitive within a teleoperation system to collect real-world demonstrations that contain tactile and proprioceptive information. Finally, we train a behavior cloning policy that incorporates tactile sensing and show that it generalizes to nuts and screwdrivers with diverse geometries. Experiments across both tasks show high task progress ratios compared to direct sim-to-real transfer and robust performance even on unseen object shapes and under external perturbations.


Tactile-Conditioned Diffusion Policy for Force-Aware Robotic Manipulation

Helmut, Erik, Funk, Niklas, Schneider, Tim, de Farias, Cristiana, Peters, Jan

arXiv.org Artificial Intelligence

Contact-rich manipulation depends on applying the correct grasp forces throughout the manipulation task, especially when handling fragile or deformable objects. Most existing imitation learning approaches often treat visuotactile feedback only as an additional observation, leaving applied forces as an uncontrolled consequence of gripper commands. In this work, we present Force-Aware Robotic Manipulation (FARM), an imitation learning framework that integrates high-dimensional tactile data to infer tactile-conditioned force signals, which in turn define a matching force-based action space. We collect human demonstrations using a modified version of the handheld Universal Manipulation Interface (UMI) gripper that integrates a GelSight Mini visual tactile sensor. For deploying the learned policies, we developed an actuated variant of the UMI gripper with geometry matching our handheld version. During policy rollouts, the proposed FARM diffusion policy jointly predicts robot pose, grip width, and grip force. FARM outperforms several baselines across three tasks with distinct force requirements -- high-force, low-force, and dynamic force adaptation -- demonstrating the advantages of its two key components: leveraging force-grounded, high-dimensional tactile observations and a force-based control space. The codebase and design files are open-sourced and available at https://tactile-farm.github.io .


FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation

Wu, Longyan, Yu, Checheng, Ren, Jieji, Chen, Li, Jiang, Yufei, Huang, Ran, Gu, Guoying, Li, Hongyang

arXiv.org Artificial Intelligence

Enabling robots with contact-rich manipulation remains a pivotal challenge in robot learning, which is substantially hindered by the data collection gap, including its inefficiency and limited sensor setup. While prior work has explored handheld paradigms, their rod-based mechanical structures remain rigid and unintuitive, providing limited tactile feedback and posing challenges for human operators. Motivated by the dexterity and force feedback of human motion, we propose FreeTacMan, a human-centric and robot-free data collection system for accurate and efficient robot manipulation. Concretely, we design a wearable gripper with dual visuo-tactile sensors for data collection, which can be worn by human fingers for intuitive control. A high-precision optical tracking system is introduced to capture end-effector poses while synchronizing visual and tactile feedback simultaneously. We leverage FreeTacMan to collect a large-scale multimodal dataset, comprising over 3000k paired visual-tactile images with end-effector poses, 10k demonstration trajectories across 50 diverse contact-rich manipulation tasks. FreeTacMan achieves multiple improvements in data collection performance compared to prior works, and enables effective policy learning for contact-rich manipulation tasks with self-collected dataset. The full suite of hardware specifications and the dataset will be released to facilitate reproducibility and support research in visuo-tactile manipulation.


DeepXPalm: Tilt and Position Rendering using Palm-worn Haptic Display and CNN-based Tactile Pattern Recognition

Miguel, Altamirano Cabrera, Oleg, Sautenkov, Jonathan, Tirado, Aleksey, Fedoseev, Pavel, Kopanev, Hiroyuki, Kajimoto, Dzmitry, Tsetserukou

arXiv.org Artificial Intelligence

Telemanipulation of deformable objects requires high precision and dexterity from the users, which can be increased by kinesthetic and tactile feedback. However, the object shape can change dynamically, causing ambiguous perception of its alignment and hence errors in the robot positioning. Therefore, the tilt angle and position classification problem has to be solved to present a clear tactile pattern to the user. This work presents a telemanipulation system for plastic pipettes consisting of a multi-contact haptic device LinkGlide to deliver haptic feedback at the users' palm and two tactile sensors array embedded in the 2-finger Robotiq gripper. We propose a novel approach based on Convolutional Neural Networks (CNN) to detect the tilt and position while grasping deformable objects. The CNN generates a mask based on recognized tilt and position data to render further multi-contact tactile stimuli provided to the user during the telemanipulation. The study has shown that using the CNN algorithm and the preset mask, tilt, and position recognition by users is increased from 9.67% using the direct data to 82.5%.


TacRefineNet: Tactile-Only Grasp Refinement Between Arbitrary In-Hand Object Poses

Wang, Shuaijun, Zhou, Haoran, Xiang, Diyun, You, Yangwei

arXiv.org Artificial Intelligence

Abstract--Despite progress in both traditional dexterous grasping pipelines and recent Vision-Language-Action (VLA) approaches, the grasp execution stage remains prone to pose inaccuracies, especially in long-horizon tasks, which undermines overall performance. T o address this "last-mile" challenge, we propose T acRefineNet, a tactile-only framework that achieves fine in-hand pose refinement of known objects in arbitrary target poses using multi-finger fingertip sensing. Our method iteratively adjusts the end-effector pose based on tactile feedback, aligning the object to the desired configuration. We design a multi-branch policy network that fuses tactile inputs from multiple fingers along with proprioception to predict precise control updates. T o train this policy, we combine large-scale simulated data from a physics-based tactile model in MuJoCo with real-world data collected from a physical system. Comparative experiments show that pretraining on simulated data and fine-tuning with a small amount of real data significantly improves performance over simulation-only training. T o our knowledge, this is the first method to enable arbitrary in-hand pose refinement via multi-finger tactile sensing alone. Project website is available at https://sites.google.com/view/tacrefinenet


VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

Bi, Jianxin, Ma, Kevin Yuchen, Hao, Ce, Shou, Mike Zheng, Soh, Harold

arXiv.org Artificial Intelligence

Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing \emph{without fine-tuning} the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at \href{https://github.com/jxbi1010/VLA-Touch}{this URL}.


Tactile-VLA: Unlocking Vision-Language-Action Model's Physical Knowledge for Tactile Generalization

Huang, Jialei, Wang, Shuo, Lin, Fanqi, Hu, Yihang, Wen, Chuan, Gao, Yang

arXiv.org Artificial Intelligence

Vision-Language-Action (VLA) models have shown remarkable achievements, driven by the rich implicit knowledge of their vision-language components. However, achieving generalist robotic agents demands precise grounding into physical interactions, especially in contact-rich scenarios where fine-grained force control is essential. We advance VLAs' implicit knowledge beyond identifying what to do, towards guiding how to physically interact with real world. This paper introduces Tactile-VLA, a novel framework that deeply fuses vision, language, action, and tactile sensing. This framework incorporates a hybrid position-force controller to translate the model's intentions into precise physical actions and a reasoning module that allows the robot to adapt its strategy based on tactile feedback. Experiments demonstrate Tactile-VLA's effectiveness and generalizability in three key aspects: (1) enabling tactile-aware instruction following, (2) utilizing tactile-relevant commonsense, and (3) facilitating adaptive tactile-involved reasoning. A key finding is that the VLM's prior knowledge already contains semantic understanding of physical interaction; by connecting it to the robot's tactile sensors with only a few demonstrations, we can activate this prior knowledge to achieve zero-shot generalization in contact-rich tasks.


Situated Haptic Interaction: Exploring the Role of Context in Affective Perception of Robotic Touch

Ren, Qiaoqiao, Belpaeme, Tony

arXiv.org Artificial Intelligence

Affective interaction is not merely about recognizing emotions; it is an embodied, situated process shaped by context and co-created through interaction. In affective computing, the role of haptic feedback within dynamic emotional exchanges remains underexplored. This study investigates how situational emotional cues influence the perception and interpretation of haptic signals given by a robot. In a controlled experiment, 32 participants watched video scenarios in which a robot experienced either positive actions (such as being kissed), negative actions (such as being slapped) or neutral actions. After each video, the robot conveyed its emotional response through haptic communication, delivered via a wearable vibration sleeve worn by the participant. Participants rated the robot's emotional state-its valence (positive or negative) and arousal (intensity)-based on the video, the haptic feedback, and the combination of the two. The study reveals a dynamic interplay between visual context and touch. Participants' interpretation of haptic feedback was strongly shaped by the emotional context of the video, with visual context often overriding the perceived valence of the haptic signal. Negative haptic cues amplified the perceived valence of the interaction, while positive cues softened it. Furthermore, haptics override the participants' perception of arousal of the video. Together, these results offer insights into how situated haptic feedback can enrich affective human-robot interaction, pointing toward more nuanced and embodied approaches to emotional communication with machines.


PP-Tac: Paper Picking Using Tactile Feedback in Dexterous Robotic Hands

Lin, Pei, Huang, Yuzhe, Li, Wanlin, Ma, Jianpeng, Xiao, Chenxi, Jiao, Ziyuan

arXiv.org Artificial Intelligence

Robots are increasingly envisioned as human companions, assisting with everyday tasks that often involve manipulating deformable objects. Although recent advances in robotic hardware and embodied AI have expanded their capabilities, current systems still struggle with handling thin, flat, and deformable objects such as paper and fabric. This limitation arises from the lack of suitable perception techniques for robust state estimation under diverse object appearances, as well as the absence of planning techniques for generating appropriate grasp motions. To bridge these gaps, this paper introduces PP-Tac, a robotic system for picking up paper-like objects. PP-Tac features a multi-fingered robotic hand with high-resolution omnidirectional tactile sensors \sensorname. This hardware configuration enables real-time slip detection and online frictional force control that mitigates such slips. Furthermore, grasp motion generation is achieved through a trajectory synthesis pipeline, which first constructs a dataset of finger's pinching motions. Based on this dataset, a diffusion-based policy is trained to control the hand-arm robotic system. Experiments demonstrate that PP-Tac can effectively grasp paper-like objects of varying material, thickness, and stiffness, achieving an overall success rate of 87.5\%. To our knowledge, this work is the first attempt to grasp paper-like deformable objects using a tactile dexterous hand. Our project webpage can be found at: https://peilin-666.github.io/projects/PP-Tac/


Whole-body Multi-contact Motion Control for Humanoid Robots Based on Distributed Tactile Sensors

Murooka, Masaki, Fukumitsu, Kensuke, Hamze, Marwan, Morisawa, Mitsuharu, Kaminaga, Hiroshi, Kanehiro, Fumio, Yoshida, Eiichi

arXiv.org Artificial Intelligence

--T o enable humanoid robots to work robustly in confined environments, multi-contact motion that makes contacts not only at extremities, such as hands and feet, but also at intermediate areas of the limbs, such as knees and elbows, is essential. We develop a method to realize such whole-body multi-contact motion involving contacts at intermediate areas by a humanoid robot. Deformable sheet-shaped distributed tactile sensors are mounted on the surface of the robot's limbs to measure the contact force without significantly changing the robot body shape. The multi-contact motion controller developed earlier, which is dedicated to contact at extremities, is extended to handle contact at intermediate areas, and the robot motion is stabilized by feedback control using not only force/torque sensors but also distributed tactile sensors. Through verification on dynamics simulations, we show that the developed tactile feedback improves the stability of whole-body multi-contact motion against disturbances and environmental errors. Furthermore, the life-sized humanoid RHP Kaleido demonstrates whole-body multi-contact motions, such as stepping forward while supporting the body with forearm contact and balancing in a sitting posture with thigh contacts. UMANOID robots are expected to realize various manipulation and locomotion tasks to support or replace humans.